Learning efficient and interpretable policies has been a challenging task in reinforcement learning (RL), particularly in the visual RL setting with complex scenes. While neural networks have achieved competitive performance, the resulting policies are often over-parameterized black boxes that are difficult to interpret and deploy efficiently. More recent symbolic RL frameworks have shown that high-level domain-specific programming logic can be designed to handle both policy learning and symbolic planning. However, these approaches rely on coded primitives with little feature learning, and when applied to high-dimensional visual scenes, they can suffer from scalability issues and perform poorly when images have complex object interactions. To address these challenges, we propose \textit{Differentiable Symbolic Expression Search} (DiffSES), a novel symbolic learning approach that discovers discrete symbolic policies using partially differentiable optimization. By using object-level abstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of symbolic expressions, while also incorporating the strengths of neural networks for feature learning and optimization. Our experiments demonstrate that DiffSES is able to generate symbolic policies that are simpler and more scalable than state-of-the-art symbolic RL methods, with a reduced amount of symbolic prior knowledge.
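To make the idea of an object-level symbolic policy concrete, here is a minimal hypothetical sketch (not the DiffSES implementation): the policy is a small, human-readable expression over object attributes rather than raw pixels. The `agent`/`target` dictionaries and the navigation rule are illustrative assumptions only.

```python
# Hypothetical sketch of an object-level symbolic policy (not DiffSES itself):
# the policy is a tiny expression tree over object attributes, so it stays
# interpretable in a way a pixel-level neural policy is not.

def symbolic_policy(agent, target):
    """Toy symbolic rule: move toward the target along the larger axis gap."""
    dx = target["x"] - agent["x"]   # horizontal offset between the two objects
    dy = target["y"] - agent["y"]   # vertical offset between the two objects
    if abs(dx) > abs(dy):
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"

if __name__ == "__main__":
    agent = {"x": 0.0, "y": 0.0}
    target = {"x": 3.0, "y": -1.0}
    print(symbolic_policy(agent, target))  # horizontal gap dominates here
```

Because the entire policy is a short expression, it can be inspected, verified, and deployed without a neural network at inference time.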
The COVID-19 pandemic has placed a heavy burden on healthcare systems worldwide and caused enormous social disruption and economic loss. Many deep learning models have been proposed to perform clinical prediction tasks, such as mortality prediction for COVID-19 patients in intensive care units (ICUs) using electronic health record (EHR) data. Despite initial success in some clinical applications, there is currently a lack of benchmarking results that would allow a fair comparison, so that the best model can be selected for clinical use. Furthermore, there is a discrepancy between the formulation of traditional prediction tasks and real-world clinical practice in intensive care. To fill these gaps, we propose two clinical prediction tasks, outcome-specific length-of-stay prediction and early mortality prediction, for COVID-19 patients in ICUs. The two tasks are adapted from the naive length-of-stay and mortality prediction tasks to accommodate clinical practice for COVID-19 patients. We propose fair, detailed, open-source data-preprocessing pipelines and evaluate 17 state-of-the-art predictive models on the two tasks, including 5 machine learning models, 6 basic deep learning models, and 6 deep learning predictive models specifically designed for EHR data. We provide benchmarking results using data from two real-world COVID-19 EHR datasets; one dataset is publicly available without any inquiry, and the other can be accessed on request. We provide fair, reproducible benchmarking results for the two tasks. We deploy all the experimental results and models on an online platform, and also allow clinicians and researchers to upload their data to the platform and quickly obtain prediction results using our trained models. We hope our efforts can further facilitate deep learning and machine learning research for COVID-19 predictive modeling.
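As a hedged illustration of how an "early mortality prediction" style label might be constructed (the paper's exact task definition may differ), the sketch below labels a patient positive if death occurs within a fixed horizon after the prediction time. The function name and the hour-based encoding are assumptions for illustration only.

```python
# Hedged sketch of an early-mortality label (illustrative, not the paper's
# exact definition): predict, at a fixed early time point of the ICU stay,
# whether the patient dies within a given horizon.

def early_mortality_label(death_hour, predict_at, horizon):
    """Return 1 if death occurs within `horizon` hours of `predict_at`.

    `death_hour` is None for survivors. Patients who died before the
    prediction time would in practice be excluded from the cohort.
    """
    if death_hour is None:
        return 0
    return int(predict_at <= death_hour <= predict_at + horizon)

if __name__ == "__main__":
    # Predict at hour 24 with a 48-hour horizon.
    print(early_mortality_label(None, 24, 48))  # survivor -> 0
    print(early_mortality_label(50, 24, 48))    # death at hour 50 -> 1
    print(early_mortality_label(100, 24, 48))   # death beyond horizon -> 0
```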
Point cloud completion has become increasingly popular among 3D point cloud generation tasks, as recovering the complete shape of a 3D object from its partial observation is a challenging yet indispensable problem. In this paper, we propose a novel SeedFormer to improve the ability of detail preservation and recovery in point cloud completion. Unlike previous methods based on a global feature vector, we introduce a new shape representation, namely Patch Seeds, which not only captures general structure from partial inputs but also preserves regional information of local patterns. Then, by integrating seed features into the generation process, we can recover faithful details for complete point clouds in a coarse-to-fine manner. Moreover, we design an Upsample Transformer by extending the transformer structure into the basic operations of point generators, which effectively incorporates spatial and semantic relationships between neighboring points. Qualitative and quantitative evaluations demonstrate that our method outperforms state-of-the-art completion networks on several benchmark datasets. Our code is available at https://github.com/hrzhou2/seedformer.
Recently, there has been growing attention on one-stage panoptic segmentation methods, which aim to segment instances and stuff jointly and efficiently within a fully convolutional pipeline. However, most existing works directly feed the backbone features to various segmentation heads, ignoring the different demands of semantic and instance segmentation: the former requires semantic-level discriminative features, while the latter requires features that are discriminative across instances. To alleviate this, we propose to first predict semantic-level and instance-level correlations among different locations, which are used to enhance the backbone features, and then feed the improved discriminative features into the corresponding segmentation heads, respectively. Specifically, we organize the correlations between a given location and all locations as a continuous sequence and predict it as a whole. Considering that such a sequence can be extremely complicated, we adopt the discrete Fourier transform (DFT), a tool that can approximate an arbitrary sequence parameterized by amplitudes and phases. For the different tasks, we generate these parameters from the backbone in a fully convolutional way, and they are implicitly optimized by the corresponding task. As a result, these accurate and consistent correlations help produce plausible discriminative features that meet the requirements of the complicated panoptic segmentation task. To verify the effectiveness of our method, we conduct experiments on several challenging panoptic segmentation datasets and achieve state-of-the-art performance with 45.1% PQ, and 32.6% PQ on ADE20K.
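The key mathematical idea above, approximating a complicated per-location sequence with a handful of DFT amplitude/phase parameters, can be sketched in a few lines of NumPy. This is an illustration of the general technique, not the paper's implementation; the sequence and truncation level `k` are assumptions.

```python
import numpy as np

# Sketch of DFT-based sequence approximation (not the paper's code):
# represent a sequence by only its K lowest-frequency Fourier coefficients,
# i.e. a small set of amplitude/phase parameters, and reconstruct from them.

def dft_approximate(seq, k):
    """Reconstruct `seq` from its K lowest-frequency DFT coefficients."""
    coeffs = np.fft.rfft(seq)            # complex coefficients = amplitude + phase
    coeffs[k:] = 0.0                     # drop all higher-frequency terms
    return np.fft.irfft(coeffs, n=len(seq))

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 64, endpoint=False)
    # A band-limited test sequence: only frequencies 1 and 2 are present,
    # so keeping k=5 coefficients reconstructs it almost exactly.
    seq = np.sin(2 * np.pi * t) + 0.3 * np.cos(4 * np.pi * t)
    approx = dft_approximate(seq, k=5)
    print(float(np.max(np.abs(seq - approx))))  # near-zero reconstruction error
```

The compression is what makes the approach practical: a length-N correlation sequence per location is replaced by a fixed, small number of parameters that a convolutional head can predict.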
Artificial intelligence (AI) provides a promising substitute for streamlining COVID-19 diagnoses. However, concerns surrounding security and trustworthiness impede the collection of large-scale representative medical data, posing a considerable challenge for training well-generalized models in clinical practice. To address this, we launched the Unified CT-COVID AI Diagnostic Initiative (UCADI), in which the AI model can be distributed to and independently executed at each host institution under a federated learning (FL) framework without data sharing. Here we show that our FL model achieved performance comparable to a panel of professional radiologists (test sensitivity/specificity in China: 0.973/0.951; in the UK: 0.730/0.942). We further evaluated the model on held-out data (collected from another two hospitals left out of FL) and heterogeneous data (acquired with contrast materials), provided visual explanations for the decisions made by the model, and analyzed the trade-offs between model performance and communication costs in the federated training process. Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK. Collectively, our work advances the prospects of utilizing federated learning for privacy-preserving digital health.
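The core mechanism that lets institutions train without sharing data is federated aggregation. Below is a minimal sketch of classic federated averaging (FedAvg), for illustration only; UCADI's actual framework is more involved, and the hospital names and parameter vectors are hypothetical.

```python
# Minimal federated-averaging (FedAvg) sketch: each site trains locally and
# only model weights (never patient data) are sent for aggregation.

def fed_avg(client_weights, client_sizes):
    """Aggregate client model weights, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

if __name__ == "__main__":
    # Two hypothetical hospitals with toy 2-parameter models.
    hospital_a = [1.0, 0.0]   # trained on 300 local scans
    hospital_b = [0.0, 1.0]   # trained on 100 local scans
    print(fed_avg([hospital_a, hospital_b], [300, 100]))  # weighted toward A
```

The size weighting means a hospital with more local data pulls the global model proportionally harder, while no raw scans ever leave either site.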
Graph neural networks (GNNs) have achieved state-of-the-art performance in node classification, regression, and recommendation tasks. GNNs work well when rich and high-quality connection structure is available. However, this requirement is not satisfied in many real-world graphs where node degrees follow power-law distributions, as many nodes have few or noisy connections. The extreme case of this is that a node may have no neighbors at all, known as the strict cold start (SCS) scenario, which forces the prediction model to rely solely on the node's input features. We propose Cold Brew to address the SCS and noisy-neighbor settings via a distillation approach. We introduce the Feature Contribution Ratio (FCR), a metric that measures the feasibility of addressing the SCS problem with inductive GNNs and helps select the best architecture for SCS generalization. We experimentally show that FCR disentangles the contributions of the various components of graph datasets, and demonstrate the superiority of Cold Brew on several public benchmark and proprietary e-commerce datasets. The source code of our approach is available at: https://github.com/amazon-research/gnn-tail-generalization.
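To give intuition for what a feature-versus-structure contribution metric captures, here is an illustrative proxy, not the paper's exact FCR definition: the fraction of the total predictive signal attributable to node features alone. The function name and scores are hypothetical.

```python
# Illustrative proxy (NOT the paper's exact FCR formula): how much of a
# node's predictive signal comes from its own features versus from
# neighbor aggregation. A strict cold-start node has no structural signal,
# so its ratio is 1.0 and predictions must rely entirely on features.

def feature_contribution_ratio(feature_score, structure_score):
    """Fraction of total predictive signal attributable to node features."""
    total = feature_score + structure_score
    return feature_score / total if total > 0 else 0.0

if __name__ == "__main__":
    print(feature_contribution_ratio(0.8, 0.0))  # strict cold start -> 1.0
    print(feature_contribution_ratio(0.6, 0.3))  # mixed signal
```

A dataset where this ratio is high suggests that feature-only (inductive) models can generalize to SCS nodes; a low ratio warns that structure carries most of the signal.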
In this paper, we present a perception-action-communication loop design using vision-based graph aggregation and inference (VGAI). This multi-agent decentralized learning-to-control framework maps raw visual observations to agent actions, aided by local communication among neighboring agents. Our framework is implemented by a cascade of a convolutional neural network and a graph neural network (CNN/GNN), addressing agent-level visual perception and feature learning as well as swarm-level communication, local information aggregation, and agent action inference. By jointly training the CNN and GNN, image features and communication messages are learned in conjunction to better address the specific task. We use imitation learning to train the VGAI controller in an offline phase, relying on a centralized expert controller. This results in a learned VGAI controller that can be deployed in a distributed manner for online execution. Additionally, the controller exhibits good scaling properties, with training on smaller teams and application to larger teams. Through a multi-agent flocking application, we demonstrate that VGAI yields performance comparable to or better than other decentralized controllers, using only the visual input modality and without accessing precise location or motion state information.
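The CNN/GNN cascade can be sketched schematically: each agent encodes its own view, averages messages from its graph neighbors (one aggregation hop), and maps the result to an action. This toy sketch only illustrates the data flow, not the VGAI architecture; the pooling encoder, one-hop mean aggregation, and `tanh` action head are stand-in assumptions.

```python
import numpy as np

# Toy sketch of a perception-action cascade (not the VGAI implementation):
# per-agent encoding -> neighbor aggregation over the comms graph -> action.

def encode(observations):
    """Stand-in for a per-agent CNN: mean-pool each agent's image."""
    return observations.mean(axis=(-2, -1))

def aggregate(features, adjacency):
    """One GNN hop: average the feature vectors of each agent's neighbors."""
    degree = adjacency.sum(axis=1, keepdims=True).clip(min=1.0)
    return adjacency @ features / degree

def act(local_features, neighbor_features):
    """Stand-in for an action head combining local and aggregated features."""
    return np.tanh(local_features + neighbor_features)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.normal(size=(3, 2, 8, 8))   # 3 agents, 2-channel 8x8 views
    adj = np.array([[0., 1., 0.],         # line-graph communication: 0-1-2
                    [1., 0., 1.],
                    [0., 1., 0.]])
    feats = encode(obs)                   # (3, 2) per-agent feature vectors
    actions = act(feats, aggregate(feats, adj))
    print(actions.shape)                  # one action vector per agent
```

Because aggregation uses only each agent's own row of the adjacency matrix, the same computation can run fully distributed at execution time.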
Leveraging the advances of natural language processing, most recent scene text recognizers adopt an encoder-decoder architecture where text images are first converted to representative features and then to a sequence of characters via `sequential decoding'. However, scene text images suffer from rich noise from different sources, such as complex backgrounds and geometric distortions, which often confuses the decoder and leads to incorrect alignment of visual features at noisy decoding time steps. This paper presents I2C2W, a novel scene text recognition technique that is tolerant to geometric and photometric degradation by decomposing scene text recognition into two inter-connected tasks. The first task focuses on image-to-character (I2C) mapping, which detects a set of character candidates from images based on different alignments of visual features in a non-sequential way. The second task tackles character-to-word (C2W) mapping, which recognizes scene text by decoding words from the detected character candidates. The direct learning from character semantics (instead of noisy image features) effectively corrects falsely detected character candidates, which greatly improves the final text recognition accuracy. Extensive experiments over nine public datasets show that the proposed I2C2W outperforms the state-of-the-art by large margins on challenging scene text datasets with various curvature and perspective distortions. It also achieves very competitive recognition performance over multiple normal scene text datasets.
The training process of generative adversarial networks (GANs), in most cases, applies uniform or Gaussian sampling in the latent space, which likely spends most of the computation on examples that are already handled properly and easy to generate. Theoretically, importance sampling speeds up stochastic optimization in supervised learning by prioritizing training examples. In this paper, we explore the possibility of adapting importance sampling to adversarial learning. We use importance sampling to replace the uniform and Gaussian sampling methods in the latent space, and employ a normalizing flow to approximate the latent-space posterior distribution via density estimation. Empirically, results on MNIST and Fashion-MNIST demonstrate that our method significantly accelerates GAN optimization while retaining visual fidelity in the generated samples.
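The underlying importance-sampling identity the paper adapts can be shown in a short, self-contained sketch: draw latents from a proposal q(z) and reweight each sample by p(z)/q(z), so "hard" regions can be sampled more often without biasing the objective. The specific densities here (standard normal target, wider normal proposal) are illustrative assumptions, not the paper's flow-based estimator.

```python
import numpy as np

# Sketch of the basic importance-sampling identity (not the paper's
# flow-based method): E_p[f(z)] = E_q[(p(z)/q(z)) * f(z)], which lets a
# proposal q oversample difficult latent regions without introducing bias.

def importance_estimate(f, p_pdf, q_pdf, q_sample, n, rng):
    z = q_sample(rng, n)               # latents drawn from the proposal q
    w = p_pdf(z) / q_pdf(z)            # importance weights p(z)/q(z)
    return float(np.mean(w * f(z)))    # unbiased estimate of E_p[f(z)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Target p = N(0, 1); proposal q = N(0, 2^2) oversamples the tails.
    p_pdf = lambda z: np.exp(-z**2 / 2.0) / np.sqrt(2.0 * np.pi)
    q_pdf = lambda z: np.exp(-z**2 / 8.0) / np.sqrt(8.0 * np.pi)
    q_sample = lambda rng, n: rng.normal(0.0, 2.0, n)
    est = importance_estimate(lambda z: z**2, p_pdf, q_pdf, q_sample, 200_000, rng)
    print(est)  # E_p[z^2] = 1 for a standard normal, so this is close to 1
```

In the paper's setting, the proposal density is not a fixed Gaussian but is learned by a normalizing flow, which is what makes the reweighting tractable for a GAN's latent space.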
Recent advances in generative adversarial networks (GANs) have demonstrated the capabilities of generating stunning photo-realistic portrait images. While some prior works have applied such image GANs to unconditional 2D portrait video generation and static 3D portrait synthesis, there are few works successfully extending GANs for generating 3D-aware portrait videos. In this work, we propose PV3D, the first generative framework that can synthesize multi-view consistent portrait videos. Specifically, our method extends the recent static 3D-aware image GAN to the video domain by generalizing the 3D implicit neural representation to model the spatio-temporal space. To introduce motion dynamics to the generation process, we develop a motion generator by stacking multiple motion layers to generate motion features via modulated convolution. To alleviate motion ambiguities caused by camera/human motions, we propose a simple yet effective camera condition strategy for PV3D, enabling both temporal and multi-view consistent video generation. Moreover, PV3D introduces two discriminators for regularizing the spatial and temporal domains to ensure the plausibility of the generated portrait videos. These elaborated designs enable PV3D to generate 3D-aware motion-plausible portrait videos with high-quality appearance and geometry, significantly outperforming prior works. As a result, PV3D is able to support many downstream applications such as animating static portraits and view-consistent video motion editing. Code and models will be released at https://showlab.github.io/pv3d.